On the Derivation and Accuracy of Certain Formulas for Sample Sizes and Operating Characteristics of Nonsequential Sampling Procedures^1

Uttam Chand^2

Formulas are given that are needed for the computation of number of observations and 
operating characteristics of single sample acceptance sampling procedures based on tests of 
statistical hypotheses. Some of the same results may be obtained by reference to existing 
tables and curves located in widely scattered places. The hypotheses considered concern 
the means and standard deviations of certain populations (that is, binomial, Poisson, normal) 
where the test is made against a one-sided alternative. The comparison of two means or 
two variances as well as the test of a single mean or variance is discussed. The accuracy 
of the formulas is considered, and where approximations are involved, the results are com- 
pared with existing tables. 



This paper develops certain formulas needed for the
number of observations and operating characteristics
of single-sample acceptance-sampling procedures
based on statistical tests of hypotheses, and considers
the question of their accuracy. For purposes
of completeness the paper also contains formulas
that are already known.

1. Introduction 

It is now common knowledge among users of
modern statistical tools that the characteristics of a
sampling plan must be specified in terms of the risks
of rejecting good material (Type I error: producer's
risk) and accepting poor material (Type II error:
consumer's risk). The problem of determining a
sample size for a given probability of Type I error,
which will minimize the probability of Type II error,
has been considered by several authors, in particular
[1 to 4].^3 This problem belongs to that broad class
of problems in the field of planned experimentation
in which one is asked to provide adequate replication
to detect treatment differences with a desired amount
of protection against taking wrong decisions. If for a
certain sample size it is impossible to reduce both
kinds of risk simultaneously to small proportions, it
will be helpful to know this in advance.

Answers to most of the questions raised in this 
paper can be obtained from the existing published 
tables and curves. The effectiveness of these 
formulas in relation to assumptions and approxima- 
tions that have been made in their derivation also 
has a theoretical interest. We shall restrict ourselves 
to the consideration of certain parametric hypo- 
theses concerning means and standard deviations of 
certain populations, mainly against one-sided alter- 
natives. 

2. Preliminaries and Notation 

In the ensuing sections $H_1$ denotes the null
hypothesis, $H_2$ any one of a set of alternative hypotheses,
$\alpha$ the probability of rejecting the null hypothesis
$H_1$ when true, and $\beta$ the probability of
accepting $H_1$ when some alternative hypothesis, $H_2$,
is true. In connection with the hypotheses concerning
the means of certain populations in which the
standard deviations are functionally related to the
means and consequently unspecified, the reader will
at once recognize that the acceptance-rejection
criterion $A$ used for a statistic $T$ is not the best in
the sense of the likelihood ratio test [5]. This
difficulty, however, can be avoided by transformation
of the original variables and has been indicated in the
appropriate sections. In connection with the two-sample
problem the formulas assume equal sample
sizes. These formulas can obviously be extended to
cases in which it is desired to take unequal sample
sizes $N_1$ and $N_2$ that are assumed in advance to be
functionally related.

1 Revision of a paper written during the summer of 1947 when the author was
a guest worker at the National Bureau of Standards. The manuscript was
actually revised while the author was teaching at Boston University.

2 Present address, c/o P. V. Sukhatme, Indian Council of Agricultural Research,
New Delhi, India.

3 Figures in brackets indicate the literature references at the end of this paper.

3. A General Formula Concerning Sample 
Size and Region of Rejection 

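Let $x = x(N)$ be a normally distributed variable
with mean $\mu_1$ and standard deviation (sd) $\sigma_1 = f(\mu_1)F(N)$
under $H_1$ and with mean $\mu_2$ and sd $\sigma_2 = f(\mu_2)F(N)$
under $H_2$ $(\mu_2 > \mu_1)$, where $F(N)$ is a certain
function of the sample size $N$ and is independent
of $\mu$. We assume $x \ge A$ as the critical region and
obtain

$$\frac{A - \mu_1}{f(\mu_1)F(N)} = K_\alpha \tag{1}$$

and

$$\frac{A - \mu_2}{f(\mu_2)F(N)} = -K_\beta \tag{2}$$

where $K_\epsilon$ is the standardized normal deviate exceeded
with probability $\epsilon$, that is,
$\frac{1}{\sqrt{2\pi}}\int_{K_\epsilon}^{\infty} e^{-t^2/2}\,dt = \epsilon$.
Solving (1) and (2) for $F(N)$ and $A$ we obtain

$$F(N) = \frac{\mu_2 - \mu_1}{K_\alpha f(\mu_1) + K_\beta f(\mu_2)} \tag{3}$$

$$A = \frac{K_\alpha \mu_2 f(\mu_1) + K_\beta \mu_1 f(\mu_2)}{K_\alpha f(\mu_1) + K_\beta f(\mu_2)} \tag{4}$$

which may also be written as

$$1 = \frac{\mu_2 - \mu_1}{K_\alpha \sigma_1 + K_\beta \sigma_2} \tag{5}$$

$$A = \frac{K_\alpha \mu_2 \sigma_1 + K_\beta \mu_1 \sigma_2}{K_\alpha \sigma_1 + K_\beta \sigma_2} \tag{6}$$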



for later convenience in certain simplifications. We
note in passing that in particular cases of application
$f(\mu)$ will either be a function of $\mu$ (cf. binomial and
Poisson), or it will not depend on $\mu$, in which case it
will be a certain function of the population sd $\sigma$
(cf. normal) or a pure number (cf. transformed
binomial and Poisson).
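
As an illustrative sketch, not part of the original derivation, the pair of conditions $(A-\mu_1)/[f(\mu_1)F(N)] = K_\alpha$ and $(A-\mu_2)/[f(\mu_2)F(N)] = -K_\beta$ of this section can be solved numerically for $F(N)$ and $A$. The names `upper_point` and `solve_general` are hypothetical, and Python's standard-library `statistics.NormalDist` is assumed for the normal deviate $K_\epsilon$.

```python
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate K_eps exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def solve_general(mu1, mu2, f1, f2, alpha, beta):
    """Return (F(N), A) solving the two conditions of section 3.

    f1 and f2 stand for f(mu1) and f(mu2); mu2 > mu1 is assumed.
    """
    k_a, k_b = upper_point(alpha), upper_point(beta)
    denom = k_a * f1 + k_b * f2
    f_of_n = (mu2 - mu1) / denom
    crit = (k_a * mu2 * f1 + k_b * mu1 * f2) / denom
    return f_of_n, crit


# Example: normal mean with known unit sd, so f(mu) = 1 and F(N) = 1/sqrt(N).
f_of_n, crit = solve_general(0.0, 1.0, 1.0, 1.0, alpha=0.05, beta=0.05)
n_required = 1.0 / f_of_n ** 2  # invert F(N) = 1/sqrt(N)
print(round(n_required, 2), round(crit, 3))
```

The only step specific to an application is inverting $F(N)$ for $N$; here $F(N) = 1/\sqrt{N}$ is assumed.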

4. Tests Relating to the Parameter of the 
Binomial Distribution 

4.1. Single Binomial 

Consider a random sample of $N$ items drawn from
an infinite population in which a proportion $P$ of the
items possesses a definite attribute $B$ and let $p$ be
the fraction of the items possessing $B$ in the sample.
Then $E(p) = P$ and $\sigma(p) = \sqrt{P(1-P)}/\sqrt{N}$. Our
hypotheses are $H_1: P = P_1$ and $H_2: P = P_2$ $(P_2 > P_1)$.
Using the normal approximation for the binomial
variable $p$, (3) and (4) imply

$$N = \left[\frac{K_\alpha\sqrt{P_1(1-P_1)} + K_\beta\sqrt{P_2(1-P_2)}}{P_2 - P_1}\right]^2 \tag{7}$$

$$A = \frac{K_\beta P_1\sqrt{P_2(1-P_2)} + K_\alpha P_2\sqrt{P_1(1-P_1)}}{K_\alpha\sqrt{P_1(1-P_1)} + K_\beta\sqrt{P_2(1-P_2)}} \tag{8}$$

While (7) determines directly the sample size that
will (approximately) guarantee a specified $\alpha$ and $\beta$,
this may also be looked upon as providing the values
of the probability of accepting $H_1$ for different values
of $P$ for given $N$ and $\alpha$. For example (7) yields

$$K_\beta = \frac{\sqrt{N}(P_2 - P_1) - K_\alpha\sqrt{P_1(1-P_1)}}{\sqrt{P_2(1-P_2)}} \tag{9}$$



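The inverse sine transformation $\theta = 2\sin^{-1}\sqrt{p}$,
where $\theta$ is measured in radians [16], avoids the difficulty
of the dependence of the standard deviation
of $p$ on the unknown $P$, since $\theta$ is approximately normally
distributed about $2\sin^{-1}\sqrt{P}$ with sd $\sigma(\theta) \approx \sqrt{1/N}$.
In terms of the transformed quantities we obtain

$$N = \left[\frac{K_\alpha + K_\beta}{2\left(\sin^{-1}\sqrt{P_2} - \sin^{-1}\sqrt{P_1}\right)}\right]^2 \tag{10}$$

$$A = \frac{2\left(K_\beta \sin^{-1}\sqrt{P_1} + K_\alpha \sin^{-1}\sqrt{P_2}\right)}{K_\alpha + K_\beta} \tag{11}$$

$$K_\beta = 2\sqrt{N}\left(\sin^{-1}\sqrt{P_2} - \sin^{-1}\sqrt{P_1}\right) - K_\alpha \tag{12}$$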

We have derived (7) and (10) to illustrate the use
of the results given in section 3. The comparison
of the two formulas raises questions of quite a complicated
nature. We have so far not found any
convenient yardstick with which to compare their
accuracy. In the light of the fact that the critical
region $\theta \ge A$ (11) has certain theoretical advantages
over $p \ge A$ (8) there is the temptation to recommend
(10). As the following example will indicate
($\alpha = .05$, $P_1 = 0.1$, and $P_2 = 0.2$), the recommendation
has nothing to do with the relative magnitude of
the values of $N$ given by (7) and (10).



  β =      0.20     0.10     0.05     0.01
  (7)      68.9    101.2    132.6    202.8
  (10)     76.8    106.3    134.4    195.8
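
The comparison above can be checked with a short sketch. It assumes the normal-approximation sample size $N = [(K_\alpha\sqrt{P_1(1-P_1)} + K_\beta\sqrt{P_2(1-P_2)})/(P_2-P_1)]^2$ of (7) and the inverse-sine sample size $N = [(K_\alpha+K_\beta)/(2(\sin^{-1}\sqrt{P_2}-\sin^{-1}\sqrt{P_1}))]^2$ of (10); the function names are hypothetical.

```python
from math import asin, sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_binomial_normal(p1, p2, alpha, beta):
    """Sample size from the normal approximation, formula (7)."""
    num = (upper_point(alpha) * sqrt(p1 * (1 - p1))
           + upper_point(beta) * sqrt(p2 * (1 - p2)))
    return (num / (p2 - p1)) ** 2


def n_binomial_arcsine(p1, p2, alpha, beta):
    """Sample size from the inverse sine transformation, formula (10)."""
    gap = asin(sqrt(p2)) - asin(sqrt(p1))  # radians
    return ((upper_point(alpha) + upper_point(beta)) / (2 * gap)) ** 2


for beta in (0.20, 0.10, 0.05, 0.01):
    print(beta,
          round(n_binomial_normal(0.1, 0.2, 0.05, beta), 1),
          round(n_binomial_arcsine(0.1, 0.2, 0.05, beta), 1))
```

The values are left unrounded to whole items so that they can be compared directly with the tabulated figures.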



4.2. Comparison of Two Binomials 

Consider two binomial processes with $P$ and $P^*$
as the values of the true proportions and let $p$ and $p^*$
be the observed proportions in a sample of $N$ from
each process. We have here $H_1: P^* - P = 0$ and
$H_2: P^* - P > 0$. The test function $X = p^* - p$ under
$H_1$ has an unspecified variance $V(X) = 2P(1-P)/N$,
where $P$ is the common (unspecified) value of $P$ and
$P^*$ under $H_1$. Under $H_2$ we have
$V(X) = P(1-P)/N + P^*(1-P^*)/N$, where $P^* - P = \delta > 0$ say, with
the value of $\delta$ specified but not the values of $P^*$ and
$P$. We are then faced with the problem of comparing
two means having unspecified and unequal
variances under $H_1$ and $H_2$.

The only satisfactory solution to the problem of
the comparison of two binomial means (see [4,
chapter 7]) is usually given in terms of the transformed
variables (section 4.1). Under the inverse
sine transformation $\sin^{-1}\sqrt{p^*} - \sin^{-1}\sqrt{p}$ is approximately
normally distributed (except when $N$ is very
small or the $P$'s are close to 0 or 1) with mean
$\sin^{-1}\sqrt{P^*} - \sin^{-1}\sqrt{P}$ and variance $1/(2N)$. We now use
the results of section 3 and obtain

$$N = \frac{1}{2}\left[\frac{K_\alpha + K_\beta}{\sin^{-1}\sqrt{P^*} - \sin^{-1}\sqrt{P}}\right]^2 \tag{13}$$

$$A = \frac{K_\alpha}{\sqrt{2N}} \tag{14}$$

$$K_\beta = \sqrt{2N}\left(\sin^{-1}\sqrt{P^*} - \sin^{-1}\sqrt{P}\right) - K_\alpha \tag{15}$$

5. Tests Relating to the Parameter of the 
Poisson Distribution 

5.1. Single Poisson 

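Let $\bar{x}$ denote the mean of a random sample of size
$N$ from a Poisson population with parameter $m$.
Let $H_1: m = m_1$ and $H_2: m = m_2$ $(m_2 > m_1)$. To apply
the results of Section 3 we assume normality for $\bar{x}$
with mean $m$ and variance $m/N$ and obtain

$$N = \left[\frac{\sqrt{m_1}K_\alpha + \sqrt{m_2}K_\beta}{m_2 - m_1}\right]^2 \tag{16}$$

$$A = \frac{K_\alpha m_2\sqrt{m_1} + K_\beta m_1\sqrt{m_2}}{\sqrt{m_1}K_\alpha + \sqrt{m_2}K_\beta} \tag{17}$$

$$K_\beta = \frac{(m_2 - m_1)\sqrt{N} - \sqrt{m_1}K_\alpha}{\sqrt{m_2}} \tag{18}$$

We notice that under both $H_1$ and $H_2$ the variance
of $\bar{x}$ is known given $N$ but differs for $H_1$ and $H_2$.
If we make use of the well known square-root transformation
[17], the variance of the transformed
variate $\sqrt{\bar{x}}$ is approximately independent of the unknown
mean and is approximately equal to $1/(4N)$. We now
obtain

$$N = \left[\frac{K_\alpha + K_\beta}{2\left(\sqrt{m_2} - \sqrt{m_1}\right)}\right]^2 \tag{19}$$

$$A = \frac{K_\alpha\sqrt{m_2} + K_\beta\sqrt{m_1}}{K_\alpha + K_\beta} \tag{20}$$

and

$$K_\beta = 2\sqrt{N}\left(\sqrt{m_2} - \sqrt{m_1}\right) - K_\alpha \tag{21}$$

For example for $m_1 = 3.0$, $m_2 = 4.0$, $\alpha = .05$, $\beta = .10$,
formulas (16) and (19) yield $N = 29.3$ and $N = 29.8$,
respectively. This is just an illustration; otherwise the
remarks made in connection with the single binomial
in the last paragraph of section 4.1 apply here as
well.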
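
The Poisson example can be reproduced with a minimal sketch. It assumes the direct normal-approximation size $N = [(\sqrt{m_1}K_\alpha + \sqrt{m_2}K_\beta)/(m_2-m_1)]^2$ of (16) and the square-root-transformation size $N = [(K_\alpha+K_\beta)/(2(\sqrt{m_2}-\sqrt{m_1}))]^2$ of (19); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_poisson_normal(m1, m2, alpha, beta):
    """Sample size with the mean treated as normal, variance m/N (16)."""
    num = sqrt(m1) * upper_point(alpha) + sqrt(m2) * upper_point(beta)
    return (num / (m2 - m1)) ** 2


def n_poisson_sqrt(m1, m2, alpha, beta):
    """Sample size after the square-root transformation, variance 1/(4N) (19)."""
    gap = sqrt(m2) - sqrt(m1)
    return ((upper_point(alpha) + upper_point(beta)) / (2 * gap)) ** 2


print(round(n_poisson_normal(3.0, 4.0, 0.05, 0.10), 1))
print(round(n_poisson_sqrt(3.0, 4.0, 0.05, 0.10), 1))
```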

5.2. Comparison of Two Poissons 

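Let $M$ and $M^*$ be the parameters of two Poisson
populations. Let $H_1: M = M^*$ and $H_2: M < M^*$,
e.g. $\sqrt{M^*} - \sqrt{M} = \eta > 0$. Consider two independent
random samples of size $N$ drawn one from each of
the two populations. Let $\bar{x}$ and $\bar{x}^*$ be the corresponding
sample means. We may regard the quantity
$\sqrt{\bar{x}^*} - \sqrt{\bar{x}}$ as approximately normally distributed
with mean $\eta = \sqrt{M^*} - \sqrt{M}$ and variance $1/(2N)$ and
consequently obtain

$$N = \frac{1}{2}\left[\frac{K_\alpha + K_\beta}{\sqrt{M^*} - \sqrt{M}}\right]^2 \tag{22}$$

$$A = \frac{K_\alpha}{\sqrt{2N}} \tag{23}$$

and

$$K_\beta = \sqrt{2N}\left(\sqrt{M^*} - \sqrt{M}\right) - K_\alpha \tag{24}$$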



6. Tests Concerning the Mean of the Normal 
Population 

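6.1. Single Mean Test ($\sigma$ known)

Let $\bar{x}$ denote the mean of a random sample of $N$
observations from a normal population $(\mu, \sigma^2)$. We
shall assume that $\sigma$ is known from past experience.
Our hypotheses are $H_1: \mu = \mu_1$ and $H_2: \mu = \mu_2$ $(\mu_2 > \mu_1)$.
Set $\mu_2 - \mu_1 = \Delta\sigma$. The test function $\bar{x}$ is normally
distributed with mean $\mu$ and sd $\sigma/\sqrt{N}$. Using the
results of Section 3 we obtain

$$N = \left[\frac{K_\alpha + K_\beta}{\Delta}\right]^2 \tag{25}$$

$$A = \frac{K_\alpha\mu_2 + K_\beta\mu_1}{K_\alpha + K_\beta} \tag{26}$$

and

$$K_\beta = \Delta\sqrt{N} - K_\alpha \tag{27}$$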
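
A minimal sketch of the known-$\sigma$ case follows. It assumes the sample size $N = [(K_\alpha+K_\beta)/\Delta]^2$ and the operating-characteristic relation $K_\beta = \Delta\sqrt{N} - K_\alpha$, from which the Type II risk is $\beta = 1 - \Phi(K_\beta)$; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return _STD.inv_cdf(1.0 - eps)


def n_known_sigma(delta, alpha, beta):
    """Sample size for the one-sided mean test with sigma known."""
    return ((upper_point(alpha) + upper_point(beta)) / delta) ** 2


def type_two_risk(delta, alpha, n):
    """Probability of accepting H1 when mu2 - mu1 = delta * sigma."""
    k_beta = delta * sqrt(n) - upper_point(alpha)
    return 1.0 - _STD.cdf(k_beta)


n = n_known_sigma(0.5, 0.05, 0.10)
print(round(n, 2), round(type_two_risk(0.5, 0.05, n), 3))
```

Evaluating `type_two_risk` over a grid of `delta` for a fixed `n` traces out the operating characteristic of the plan.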



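6.2. Comparison of Two Normal Means ($\sigma$'s known)

Let $\bar{x}_1$, $\bar{x}_2$ be the sample means of two independent
random samples of equal size $N$ drawn one from each
of the two normal populations $\pi_1(\mu_1, \sigma_1)$, $\pi_2(\mu_2, \sigma_2)$
respectively. We assume both $\sigma_1^2$ and $\sigma_2^2$ to be known
from past experience. Our hypotheses are $H_1: \mu_1 = \mu_2$
and $H_2: \mu_1 < \mu_2$. The test function $\bar{x}_2 - \bar{x}_1$ is normally
distributed $[(\mu_2 - \mu_1), (\sigma_1^2 + \sigma_2^2)/N]$ and, therefore, we
obtain

$$N = \left[\frac{K_\alpha + K_\beta}{d}\right]^2 \tag{28}$$

$$A = \frac{K_\alpha(\mu_2 - \mu_1)}{K_\alpha + K_\beta} \tag{29}$$

and

$$K_\beta = d\sqrt{N} - K_\alpha \tag{30}$$

where

$$d = \frac{\mu_2 - \mu_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}.$$

If $\sigma_1 = \sigma_2 = \sigma$ we obtain

$$N = 2\left[\frac{K_\alpha + K_\beta}{\Delta}\right]^2 \tag{31}$$

$$A = \frac{K_\alpha(\mu_2 - \mu_1)}{K_\alpha + K_\beta} \tag{32}$$

and

$$K_\beta = \Delta\sqrt{\frac{N}{2}} - K_\alpha. \tag{33}$$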



6.3. Single Mean Test ($\sigma$ unknown)

For notation see section 6.1. In this case the well
known Student's statistic $t = \sqrt{N}(\bar{x} - \mu_1)/s$, where
$s^2$ is the unbiased estimate of $\sigma^2$, is used to accept or
reject $H_1$. Tables for determining the sample size
for detecting a given value of $\Delta$ with preassigned $\alpha$
and $\beta$ are given in [1] and operating characteristic
curves for the same are given in [2]. To obtain a
convenient formula for the sample size we proceed
as follows:

Consider $P\{t > k\} = P\{[(\bar{x} - \mu_1) - (ks/\sqrt{N})] > 0\}$
where $k$ is a certain constant. The quantity
$Z = (\bar{x} - \mu_1) - (ks/\sqrt{N})$ consists of two parts: $(\bar{x} - \mu_1)$ is
normally distributed $(0, \sigma^2/N)$ under $H_1$, and
$(\mu_2 - \mu_1, \sigma^2/N)$ under $H_2$; and for fixed $N$, $ks/\sqrt{N}$ is
a constant multiple of $s$, where $s$ is approximately
normally distributed $(C_1\sigma,\ C_2^2\sigma^2/2(N-1))$ and $C_1$
and $C_2$ are certain constants less than 1.^4 It will
be assumed for the purpose of this discussion that
both $C_1$ and $C_2$ are equal to unity. The dual nature
of these assumed approximations to the distribution
of $s$ should be noted. Therefore,

$$E(Z \mid H_1) = \bar{Z}_1 = -\frac{k\sigma}{\sqrt{N}}$$

$$E(Z \mid H_2) = \bar{Z}_2 = (\mu_2 - \mu_1) - \frac{k\sigma}{\sqrt{N}}$$

$$V(Z) = \frac{\sigma^2}{N} + \frac{k^2\sigma^2}{2N(N-1)}$$

For $Z \ge A$ as the critical region we obtain (see section 3)

$$\frac{A + k\sigma/\sqrt{N}}{\dfrac{\sigma}{\sqrt{N}}\sqrt{1 + \dfrac{k^2}{2(N-1)}}} = K_\alpha$$

$$\frac{A - (\mu_2 - \mu_1) + k\sigma/\sqrt{N}}{\dfrac{\sigma}{\sqrt{N}}\sqrt{1 + \dfrac{k^2}{2(N-1)}}} = -K_\beta$$

Eliminating $A$ from the above two equations we get

$$\frac{\Delta}{K_\alpha + K_\beta} = \frac{1}{\sqrt{N}}\sqrt{1 + \frac{k^2}{2(N-1)}} \tag{34}$$

where $\Delta$ has been previously defined (section 6.1).
This is a quadratic in $N$ and could be solved for $N$
if $k$ were known. We now determine $k$ from the
consideration that $A$ is to be independent of $\sigma$.
After simplification we obtain

$$A = \sigma\left(\frac{\Delta K_\alpha}{K_\alpha + K_\beta} - \frac{k}{\sqrt{N}}\right)$$

The right-hand side will be independent of $\sigma$ if
and only if the quantity in parentheses vanishes.

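4 The general expressions for $C_1$ and $C_2$ are:

$$C_1 = \sqrt{\frac{2}{N-1}}\;\frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(\frac{N-1}{2}\right)}, \qquad C_2 = \sqrt{2(N-1)\left(1 - C_1^2\right)}.$$

For small values of $N$, $C_1$ and $C_2$ take the following values:

   N      C1       C2
   2     0.798    0.852
   3      .886     .927
   4      .921     .954
   5      .940     .965
  10      .973     .985
  25      .990     .995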

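Therefore,

$$k = \frac{\Delta K_\alpha\sqrt{N}}{K_\alpha + K_\beta} \tag{35}$$

whence $A = 0$ and is not only independent of $\sigma$ but
actually takes the originally intended value in the
inequality $P(Z > 0)$. Substituting this value of $k$
in (34) we finally obtain

$$N = \frac{b + \sqrt{b^2 - 4a}}{2a} \tag{36}$$

where

$$a = \frac{\Delta^2}{(K_\alpha + K_\beta)^2} \quad\text{and}\quad b = 1 + a\left(1 + \frac{K_\alpha^2}{2}\right).$$

Similarly

$$K_\beta = -K_\alpha + \Delta\left[N\left(1 - \frac{K_\alpha^2}{2(N-1)}\right)\right]^{1/2} \tag{37}$$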
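
As a hedged sketch of the computation just described, the quadratic solution $N = [b + \sqrt{b^2 - 4a}]/(2a)$ with $a = \Delta^2/(K_\alpha+K_\beta)^2$ and $b = 1 + a(1 + K_\alpha^2/2)$, together with $k = \Delta K_\alpha\sqrt{N}/(K_\alpha+K_\beta)$, can be evaluated as follows. The function names are hypothetical and the result is left unrounded.

```python
from math import sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_and_k_sigma_unknown(delta, alpha, beta):
    """Return (N, k) for the one-sided single mean test, sigma unknown."""
    k_a, k_b = upper_point(alpha), upper_point(beta)
    a = delta ** 2 / (k_a + k_b) ** 2
    b = 1.0 + a * (1.0 + k_a ** 2 / 2.0)
    n = (b + sqrt(b * b - 4.0 * a)) / (2.0 * a)    # larger root of the quadratic
    k = delta * k_a * sqrt(n) / (k_a + k_b)        # approximate t percentage point
    return n, k


n, k = n_and_k_sigma_unknown(1.0, 0.05, 0.20)
print(round(n, 2), round(k, 3))
```

Since $b \ge 1 + a$, the discriminant $b^2 - 4a \ge (1-a)^2$ is never negative, so the square root is always real.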

If we replace $N - 1$ by $N$, the above two formulas
reduce to somewhat simpler expressions of the form



A^= 



<i+i«) 



+1 



K^=-Ka+A 



(^-f)"- 



(36a) 



(37a) 



For any given $\Delta$, $\alpha$, and $\beta$, values of $N$ from (36)
are compared with the Neyman-Tokarska^5 Tables
[1] in table 1. These values will be found
to be approximately the same.

As pointed out in the previous paragraph, (34)
was obtained under the assumption that $k$ is unknown.
The classical procedure employs $k = t_\alpha(n)$
where $n = N - 1$.^5 Therefore if the probability points
of $t$ are not available, (35) furnishes an approximation
to such points $t_\alpha$. We have considered the accuracy
of such $k$ points in relation to $t_\alpha$ in terms of
$P[t \ge x] = \alpha(x)$. Values of $t_\alpha$, $k$, $\alpha(t_\alpha)$, and $\alpha(k)$ are
given in tables 2a and 2b for different $N$'s and for
different $\alpha$ and $\beta$. Strictly speaking $\alpha(t_\alpha) = \alpha$, but
when $t_\alpha$ to only three decimals is used, $\alpha(t_\alpha)$ may
differ slightly from $\alpha$ as shown in tables 2a and 2b.

We notice that $k$ values are in general conservative
$t_\alpha$ estimators and that the values of $\alpha(k)$ are consistently
greater than the corresponding value of
$\alpha(t_\alpha)$. In this sense a user of our formulas is likely
to declare slightly too many significant results.
The danger, if it can be so called, is not very great,
but it is still there.

The question is asked: is it possible to eliminate
this "danger" and still utilize formula (36) for $N$?
It should be noticed that we cannot utilize the
available percentage points of $t$ in the derivation of
(36). However, (34) can still be solved for $N$ assuming
$k$ to be known, say equal to $t_\alpha(n)$, resulting in

$$N = \frac{(a + 1) + \sqrt{(a - 1)^2 + 2a\,t_\alpha^2(n)}}{2a} \tag{38}$$

Values of $N$ from (38) corresponding to different
$t_\alpha(n)$, $\Delta$, $\alpha$ and $\beta$ are also given in table 1. Since the
values of $N$ as given by (36), (38) and the Neyman-Tokarska
Tables are approximately the same, it is
recommended that (38) be used in conjunction with
the $t$-tables. This will save the labor of calculating
$k$ values.

5 Neyman-Tokarska's $\rho$ is equal to our $\Delta\sqrt{N}$. Since the standard table of
probability points of $t$, [18] table 4, gives the two-tail probability points of $t$, our
$t_\alpha(n)$ corresponds to the entry given there for $2\alpha$ and $n$ degrees of freedom.

Table 1. Comparisons among different methods for obtaining necessary sample sizes

[Single sample one-sided mean test]

α = .05
               β = .20                  β = .10                  β = .05
   Δ     N-T^a  (36)  (38)  (25)    N-T  (36)  (38)  (25)    N-T  (36)  (38)  (25)
   2        3     3     4     2       4     4     4     2       5     4     4     3
   1        8     8     8     6      10    10    10     9      12    12    12    11
  .5       26    26    26    25      36    36    36    34      45    45    45    43
  .25     100   100   100    99     139   138   138   137     175   175   175   173
  .125    398   397   397   390     550   550   550   548     694   694   694   693

α = .01
               β = .10                  β = .05                  β = .01
   Δ     N-T^a  (36)  (38)  (25)    N-T  (36)  (38)  (25)    N-T  (36)  (38)  (25)
   2        6     6     7     3       7     7     7     4       8     8     9     5
   1       16    16    16    13      19    19    19    16      24    24    25    22
  .5       55    55    55    52      66    66    66    63      89    89    89    87
  .25     211   211   211   208     255   255   255   252     349   349   349   346
  .125    837   836   836   833    1012  1012  1012  1009    1388  1388  1388  1388

a "N-T" refers to values obtained either directly or by interpolation from Neyman and Tokarska Tables [1]. Values under "(25)" are for $\sigma$ known and are included
here for purposes of comparison with the other three for $\sigma$ unknown. For a relation between the sample sizes for $\sigma$ known and unknown see
section 6.5.

Table 2a.^a Comparison of the accuracy of the different
percentage points of t and k values for the same Δ

α = .05
            Neyman and Tokarska            This paper
   β        N     t_α     α(t_α)        N      k       α(k)
  .50       3    2.920   0.05000        3    3.338   0.04958
            4    2.353    .05000        4    2.110    .06270
           12    1.796    .05003       12    1.753    .05370
  .20       3    2.920    .05000        3    2.458    .06659
            8    1.895    .04996        8    1.840    .05412
           26    1.708    .05012       26    1.691    .05163
  .10       4    2.353    .05000        4    2.235    .05575
           10    1.833    .04999       10    1.783    .05414
  .05       5    2.132    .04995        4    2.110    .06270
           12    1.796    .05003       12    1.753    .05370
  .01       6    2.015    .05003        6    1.959    .05370
           17    1.746    .04995       17    1.718    .05250

a Values of $P(t > k) = \alpha(k)$ and $P(t > t_\alpha) = \alpha(t_\alpha)$ for $N \le 21$ were obtained by
interpolation in "Student's" Table I [6] and for $22 \le N \le 31$ by interpolation in
Table XXV of [7].
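
A minimal sketch of the use of (38), $N = [(a+1) + \sqrt{(a-1)^2 + 2a\,t_\alpha^2(n)}]/(2a)$ with $a = \Delta^2/(K_\alpha+K_\beta)^2$, is given below. The percentage point of $t$ is passed in by the caller, as it would be read from a $t$-table; the function name and the choice of starting value are assumptions made only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_from_t_point(delta, alpha, beta, t_point):
    """Sample size from (38) for a given one-sided percentage point of t."""
    a = delta ** 2 / (upper_point(alpha) + upper_point(beta)) ** 2
    return ((a + 1.0) + sqrt((a - 1.0) ** 2 + 2.0 * a * t_point ** 2)) / (2.0 * a)


# t_point = 1.895 is the one-sided 5-percent point of t for n = 7 d.o.f.,
# i.e. the entry appropriate to a trial sample size of N = 8.
print(round(n_from_t_point(1.0, 0.05, 0.20, 1.895), 2))
```

Because $t_\alpha(n)$ itself depends on $N$ through $n = N - 1$, the trial value of $N$ used to look up the percentage point should agree with the value returned; if it does not, the lookup is repeated with the new $N$.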

Since any symmetric two-sided test may be regarded
as a combination of two one-sided tests,
values of $N$ and $k$ may be obtained from the corresponding
single-tailed formulas by substituting
$K_{\alpha/2}$ for $K_\alpha$ and $t_{\alpha/2}$ for $t_\alpha$, in which case the "$\beta$" deduced
from the resulting value of $K_\beta$ will over-estimate
the true value of $\beta$, the probability of accepting $H_1$
when $\mu_2 - \mu_1 = \Delta\sigma$, by the amount $1 - \beta'$ where $\beta'$ is
given by (37) or (37a) with the term in $\Delta$ taken
with a negative sign. Values of $N$ obtained from
[8] and [9] and from the formulas (36) and (38) of this
paper are given in table 3. The corresponding two-tailed
values of $t$ and $k$ are also given in table 3.

Table 2b.^a Comparison of the accuracy of the different
percentage points of t and k values for the same Δ

α = .01
            Neyman and Tokarska            This paper
   β        N     t_α     α(t_α)        N      k       α(k)
  .50       4    4.541   0.00997        5    4.371   0.00594
            8    2.998    .01003        8    2.912    .01130
           24    2.500    .01250       24    2.473    .01322
  .20       5    3.747    .01000        6    3.531    .00833
           13    2.681    .01002       13    2.644    .01070
  .10       6    3.365    .00996        6    3.277    .01099
           16    2.602    .00996       16    2.571    .01064
  .05       7    3.143    .01003        7    3.120    .01033
           19    2.552    .00974       19    2.528    .01043
  .01       8    2.998    .01003        8    2.912    .01130
           24    2.500    .01250       24    2.473    .01322

a Values of $P(t > k) = \alpha(k)$ and $P(t > t_\alpha) = \alpha(t_\alpha)$ for $N \le 21$ were obtained by
interpolation in "Student's" Table I [6] and for $22 \le N \le 31$ by interpolation in
Table XXV of [7].

6.4. Comparison of Two Means (common $\sigma$ unknown)

For notation refer to section 6.2. Consider two
samples of equal size $N$. Our hypotheses are $H_1$:
$\mu_1 = \mu_2$; $H_2$: $\mu_1 < \mu_2$. Let $s^2$ denote the unbiased estimate
of the common variance $\sigma^2$. The statistic

$$t = \frac{\bar{x}_2 - \bar{x}_1}{s\sqrt{2/N}},$$

which under $H_1$ has "Student's" $t$-distribution
with $2(N-1)$ d.o.f., is used to accept or reject $H_1$.
Consider $P(t > k_1) = P(\bar{x}_2 - \bar{x}_1 - k_1\sqrt{2/N}\,s > 0)$ where
we assume $k_1$ to be a certain unknown constant. Let

$$Z = (\bar{x}_2 - \bar{x}_1) - k_1\sqrt{\frac{2}{N}}\,s.$$

We assume that $s$ is approximately normally distributed
with mean $\sigma$ and sd $\sigma/\sqrt{4(N-1)}$. Therefore
$Z$ is approximately normally distributed with

$$\sigma(Z) = \sigma\sqrt{\frac{2}{N} + \frac{k_1^2}{2N(N-1)}},$$

where $\sigma(Z)$ denotes the standard deviation of $Z$.
Proceeding as in section 6.3 we obtain

$$\frac{\Delta}{K_\alpha + K_\beta} = \sqrt{\frac{2}{N} + \frac{k_1^2}{2N(N-1)}} \tag{39}$$

and

$$A = \frac{K_\alpha\sigma\Delta}{K_\alpha + K_\beta} - \sigma\sqrt{\frac{2}{N}}\,k_1. \tag{40}$$

If we assume $k_1 = t_\alpha(n)$ where $t_\alpha$ is the one-sided
$\alpha$-point of $t$ for $n = 2(N-1)$ d.o.f., the equation (39)
yields

$$N = \frac{a + 2 + \sqrt{(a - 2)^2 + 2a\,t_\alpha^2(n)}}{2a} \tag{41}$$

where $a$ has been defined in section 6.3. The reason
for such an illogical assumption about the knowledge
of $t_\alpha(n)$ before $N$ is actually determined has
been indicated in the previous section. If $k_1$ is unknown
we determine $k_1$ from the consideration that
$A$ is to be independent of $\sigma$. The relation (40)
yields

$$k_1 = \sqrt{\frac{N}{2}}\;\frac{\Delta K_\alpha}{K_\alpha + K_\beta}. \tag{42}$$

We substitute this value of $k_1$ in (39) and obtain

$$N = \frac{b_1 + \sqrt{b_1^2 - 8a}}{2a} \tag{43}$$

where $a$ has been previously defined (section 6.3) and

$$b_1 = 2 + a\left(1 + \frac{K_\alpha^2}{4}\right).$$

For determining the operating characteristic we
similarly obtain

$$K_\beta = -K_\alpha + \Delta\left[\frac{N}{2}\left(1 - \frac{K_\alpha^2}{4(N-1)}\right)\right]^{1/2} \tag{44}$$

We give in table 4 values of $N$ as obtained from
(43) and Tables of Neyman^6 and Tokarska [1].
While (42) provides approximate values of the percentage
points of $t$ for $2(N-1)$ d.o.f., the formula (43)
may be used advantageously by utilizing the available
percentage points of $t$ as acceptance-rejection
criteria. Formulas obtained here for single-tailed
comparisons can also be used for a two-tailed test
by substituting appropriate two-tailed values of the
quantities involved. If the two populations have
unequal variances and their ratio is known we still
can construct a test function similar to $t$ and use the
above formulas. (For the structure of the test
function see [10].)

6 For this particular case $\rho$ as defined in [1] is equal to $\Delta\sqrt{N/2}$.

Table 3.^a Comparisons among different methods for obtaining necessary sample sizes
[Two-sided single mean test: $\sigma$ unknown]

α = .05
                      β = .30                                  β = .20
   Δ     Tabulated  (36)  (38)   t_{α/2}    k       Tabulated  (36)  (38)   t_{α/2}    k
   2         4        4     4    3.182    3.190        4         4     5    3.182    2.949
   1         8        8     8    2.365    2.280       10        10    10    2.262    2.210
  .5        27       27    27    2.056    2.038       33        33    33    2.037    2.021
  .25      101      101   101    1.982    1.979      128       128   128    1.977    1.975
  .125     397      397   397    1.966    1.965      504       504   504    1.964    1.964

α = .01
                      β = .30                                  β = .20
   Δ     Tabulated  (36)  (38)   t_{α/2}    k       Tabulated  (36)  (38)   t_{α/2}    k
   2         6        6     6    4.032    4.184        6         7     7    4.032    3.933
   1        13       13    13    3.055    3.019       15        15    15    2.977    2.941
  .5        42       42    42    2.701    2.687       50        50    50    2.680    2.668
  .25      157      157   157    2.607    2.604      190       190   190    2.601    2.599
  .125     618      618   618    2.584    2.583      751       751   751    2.583    2.582

a "Tabulated" values were obtained from [8] and [9] and $t_{\alpha/2}$ indicates the two-tailed $\alpha$ value of $t$. See footnote 5.

Table 4. Comparisons among different methods for obtaining
necessary sample sizes N^a

[Two means one-sided test; common $\sigma$ unknown: Δ = 2]

                    α = .05                        α = .01
   β       Neyman-Tokarska   (43)       Neyman-Tokarska   (43)
  .50            2             2               4            4
  .20            4             4               6            7
  .10            5             5               8            8
  .05            6             6               9            9
  .01            8             9              12           12

a N is the size of one of the two equal samples.
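
As a hedged sketch for the two-sample case with common unknown $\sigma$, the sample size $N = [b_1 + \sqrt{b_1^2 - 8a}]/(2a)$ with $a = \Delta^2/(K_\alpha+K_\beta)^2$ and $b_1 = 2 + a(1 + K_\alpha^2/4)$ can be evaluated as follows. This is the form reached by substituting $k_1 = \sqrt{N/2}\,\Delta K_\alpha/(K_\alpha+K_\beta)$ into $\Delta/(K_\alpha+K_\beta) = \sqrt{2/N + k_1^2/(2N(N-1))}$; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_two_means_sigma_unknown(delta, alpha, beta):
    """Size N of each of two equal samples, one-sided test, common sigma unknown."""
    k_a, k_b = upper_point(alpha), upper_point(beta)
    a = delta ** 2 / (k_a + k_b) ** 2
    b1 = 2.0 + a * (1.0 + k_a ** 2 / 4.0)
    return (b1 + sqrt(b1 * b1 - 8.0 * a)) / (2.0 * a)


for beta in (0.50, 0.20, 0.10, 0.05, 0.01):
    print(beta, round(n_two_means_sigma_unknown(2.0, 0.05, beta), 2))
```

The discriminant is never negative, since $b_1 \ge 2 + a$ gives $b_1^2 - 8a \ge (2-a)^2$.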

6.5. Comparison of Sample Sizes for Known and
Unknown $\sigma$

It need hardly be emphasized that in situations in
which large sample sizes are required, the normal
test (sections 6.1 and 6.2) and $t$-test (sections 6.3
and 6.4) will both yield approximately the same $N$
(see for example table 1). To determine the relations
between the two, let $N_m$ be the number of
observations required when $\sigma$ is known and $N_t$ the
corresponding number when $\sigma$ is not known. After
some simplification the following asymptotic relations
between $N_m$ and $N_t$ are obtained. For the
single-mean test, (25) and (36a) yield



A^-A^. 1 



('+«;>' 



(45) 



and for the two-means test, (31) and (43) yield 

A',~iV.(l+^3+l. (46) 

7. Tests Concerning Variances of Normal 
Populations 

Tests relating to population variances fall into two 
well defined categories. In one case we assume that 
the variability of a certain product is known and it 
is desired to find out whether a new product is more 
variable than this. In the other case we are asked 
to choose between two products on the basis of their 
variability which is unknown. We discuss these 
situations in the following sections. 

7.1. Single Variance Test

Consider a random sample of size $N$ $(= n + 1)$ from
a normal population $(\mu, \sigma^2)$. Let $s^2$ be the unbiased
sample estimate of $\sigma^2$. Our hypotheses are $H_1$:
$\sigma^2 = \sigma_0^2$ and $H_2$: $\sigma^2 = \lambda\sigma_0^2$ $(\lambda > 1)$. For a given level of
significance $\alpha$, if $\chi^2 = ns^2/\sigma_0^2 \ge \chi_\alpha^2$, we reject $H_1$ and
conclude that $\sigma^2 > \sigma_0^2$.

Let $\lambda(\alpha, \beta, n)$^7 denote the value of $\sigma^2/\sigma_0^2$ for which
the probability of the decision $\sigma^2 = \sigma_0^2$ equals $\beta$ when
the test is conducted at the $\alpha$ level of significance
with $n$ d.o.f. It can be shown [2], [4] that the probability
of an error of the second kind is exactly $\beta$ if
$\lambda(\alpha, \beta, n) = \chi_\alpha^2(n)/\chi_{1-\beta}^2(n)$.
If we are testing $H_1$: $\sigma^2 = \sigma_0^2$ against
$H_2$: $\sigma^2 = \lambda\sigma_0^2$ $(\lambda < 1)$ we have $\lambda(\alpha, \beta, n) = \chi_{1-\alpha}^2/\chi_\beta^2$.

Curves for the operating characteristics of such
testing procedures are given in [2] and [4]. Eisenhart
[4] has also given extensive tables for $\lambda(\alpha, \beta, n)$.

7 Our $\lambda(\alpha, \beta, n)$ is equivalent to $\rho(\alpha, \beta, n)$ of [4].

The problem of determining a direct relation
between large $n$ and any given set of values of $\lambda$, $\alpha$,
and $\beta$ was first considered by Wallis [1, footnote of p.
278]. Assuming normality of $s$ (see section 6.3) and
applying the results of Section 3 we obtain

$$n = \frac{1}{2}\left[\frac{K_\alpha + \sqrt{\lambda}K_\beta}{\sqrt{\lambda} - 1}\right]^2 \tag{47}$$

$$A = \frac{\sqrt{\lambda}\,\sigma_0(K_\alpha + K_\beta)}{K_\alpha + \sqrt{\lambda}K_\beta} \tag{48}$$

$$\lambda(\alpha, \beta, n) = \left[\frac{\sqrt{2n} + K_\alpha}{\sqrt{2n} - K_\beta}\right]^2 \tag{49}$$

To compare the accuracy of this formula with the
Tables [4] consider the following situation: if a decision
$\sigma = \sigma_0$ is a serious error from the practical viewpoint
when $\sigma \ge 1.500\,\sigma_0$ and it is desired to keep the
risk of such an error below .05 when the test is conducted
at the 5-percent level of significance, how
many d.o.f. will be needed for $s$? The formula (47)
gives $n = 33.8$ and from [4] we find that 34 d.o.f. are
needed. Table 5 presents the calculated values of
$n$ from (47) and table 6 presents the calculated
values of $\lambda$ from (49). For a comparative discussion
on the use of these formulas in relation to others see
next section.
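
The worked example can be checked with a short sketch. It assumes $s$ is treated as normal with mean $\sigma$ and sd $\sigma/\sqrt{2n}$, giving $n = \tfrac{1}{2}[(K_\alpha + \sqrt{\lambda}K_\beta)/(\sqrt{\lambda}-1)]^2$ and, inversely, $\lambda = [(\sqrt{2n}+K_\alpha)/(\sqrt{2n}-K_\beta)]^2$. The function names are hypothetical; note that $\lambda$ is a ratio of variances, so $\sigma = 1.5\,\sigma_0$ corresponds to $\lambda = 2.25$.

```python
from math import sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def dof_single_variance(lam, alpha, beta):
    """Degrees of freedom n needed to detect a variance ratio lam > 1."""
    root = sqrt(lam)
    return 0.5 * ((upper_point(alpha) + root * upper_point(beta)) / (root - 1.0)) ** 2


def detectable_ratio(n, alpha, beta):
    """Variance ratio detectable with n d.o.f.; requires sqrt(2n) > K_beta."""
    root_2n = sqrt(2.0 * n)
    return ((root_2n + upper_point(alpha)) / (root_2n - upper_point(beta))) ** 2


print(round(dof_single_variance(2.25, 0.05, 0.05), 1))
print(round(detectable_ratio(5, 0.05, 0.25), 3))
```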

7.2. An Alternative Formula for the Single Variance
Test Based on the Distribution of Log $s^2$

As pointed out by Bartlett and Kendall [11] the
distribution of $\log s^2$ ^8 depends on $\sigma^2$ only through the
term $\sigma^2$ in its expected value. Consequently the
choice of the critical region based on the distribution
of $\log s$ in place of $s$ has obvious advantages. In this
section we explore the possibility of using some
formulas based on the distribution of $\log s$.

The cumulant function $K(t)$ of $\log s$ is given [11] by

$$K(t) = \frac{t}{2}\left(\log\sigma^2 - \log\frac{n}{2}\right) + \log\Gamma\!\left(\frac{n + t}{2}\right) - \log\Gamma\!\left(\frac{n}{2}\right)$$

which yields the following expressions for the first
two cumulants

$$\kappa_1 = \frac{1}{2}\left(\log\sigma^2 - \log\frac{n}{2}\right) + \frac{1}{2}\psi\!\left(\frac{n}{2}\right)$$

$$\kappa_2 = \frac{1}{4}\psi'\!\left(\frac{n}{2}\right)$$

where $\psi(x) = \frac{d}{dx}\log\Gamma(x)$. The results of Section 3
applied to $\log s$ yield

$$\kappa_2(n) = \left[\frac{\log\lambda}{2(K_\alpha + K_\beta)}\right]^2 \tag{50}$$

$$A = \frac{1}{2}\left[\log\sigma_0^2 - \log\frac{n}{2} + \psi\!\left(\frac{n}{2}\right)\right] + K_\alpha\sqrt{\kappa_2(n)} \tag{51}$$

$$\lambda(\alpha, \beta, n) = e^{2(K_\alpha + K_\beta)\sqrt{\kappa_2(n)}}. \tag{52}$$

These formulas assume no other approximation
except that of normality and in that sense may be
regarded as exact relative to (47), which assumes a
dual type of approximation for the distribution of $s$
(see section 6.3). While the accuracy of (50) and
(52) does not depend upon the accuracy with which
we estimate $\kappa_1$, it does depend upon the complicated
expression $\psi'(n/2)$. For a given $\alpha$, $\beta$, and $\lambda$ the
only way in which we can utilize (50) for finding
the necessary sample size is to approximate the
asymptotic expansion of $\psi'(n/2)$ [12].

As a first approximation if we set $\kappa_2 \approx 1/[2(n-1)]$
we obtain

$$n = 1 + 2\left[\frac{K_\alpha + K_\beta}{\log\lambda}\right]^2 \tag{53}$$

$$\log\lambda = \sqrt{\frac{2}{n-1}}\,(K_\alpha + K_\beta). \tag{54}$$

8 All logarithms are to the base $e$.

Table 5a. Comparison of the sample sizes (n = N−1) for the
single variance test

α = .05
                    β = .05                         β = .10
   λ      (47)    (53)   Tabulated       (47)    (53)   Tabulated
   2      46.0    46.1     46.4          34.9    36.7     36.0
   2.5    26.7    26.8     26.6          20.0    21.4     20.5
   3      18.8    18.9     18.8          13.9    15.2     14.5
   3.5    14.7    14.8     14.6          10.8    11.9     11.1

Table 5b. Comparison of the sample sizes (n = N−1) for the
single variance test

α = .01
                    β = .01                         β = .05
   λ      (47)    (53)   Tabulated       (47)    (53)   Tabulated
   2      91.9    91.1     91.4          63.1    66.6     64.3
   2.5    53.4    52.6     53.1          35.9    38.6     37.2
   3      37.7    36.9     37.6          25.0    27.1     25.6
   3.5    29.4    28.6     28.7          19.3    21.1     19.8
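
A minimal sketch of the first approximation follows. It assumes $\kappa_2 \approx 1/[2(n-1)]$ for the variance of $\log s$, so that the mean difference $\tfrac{1}{2}\log\lambda$ equals $(K_\alpha+K_\beta)\sqrt{\kappa_2}$; this gives $n = 1 + 2[(K_\alpha+K_\beta)/\log\lambda]^2$ and $\log\lambda = \sqrt{2/(n-1)}\,(K_\alpha+K_\beta)$. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def upper_point(eps):
    """Standardized normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def dof_from_log_s(lam, alpha, beta):
    """Degrees of freedom n from the log s approximation (natural logarithm)."""
    return 1.0 + 2.0 * ((upper_point(alpha) + upper_point(beta)) / log(lam)) ** 2


def ratio_from_log_s(n, alpha, beta):
    """Detectable variance ratio lambda for n > 1 d.o.f."""
    return exp(sqrt(2.0 / (n - 1.0)) * (upper_point(alpha) + upper_point(beta)))


print(round(dof_from_log_s(2.0, 0.05, 0.05), 1))
print(round(ratio_from_log_s(5, 0.05, 0.25), 3))
```

The two functions are exact inverses of one another, which gives a quick consistency check on any pair $(n, \lambda)$.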

In tables 5a and 5b values of $n$ from [4], and values
calculated from (47) and (53) for different values
of $\alpha$, $\beta$, and $\lambda$ are compared. In table 6 we give
values of $\lambda(\alpha, \beta, n)$ to reemphasize the nature of
approximations based on the distribution of $s$ and
$\log s$. In this connection for the application of (52)
values of $\kappa_2(n)$ were taken from [11].

It appears that for the customary values of the
probabilities of errors of the two kinds, $\alpha = \beta = .05$
and $\alpha = \beta = .01$, formulas (47) and (53) provide very
good approximations to $n$ for small sample sizes.
If the percentage points of the $\chi^2$-distribution are
available, (47) is preferable because it is easier to
compute $n$ from (47) than from (53). Even for such
a small value as $n = 5$, (47) errs on the safe side in
the sense that it gives (at least for $\alpha = \beta$) a sample
size which will always be sufficient to detect this
difference. The formula (53) also shares this property
with (47). In the absence of the percentage
points of the $\chi^2$-distribution it perhaps ought to be
emphasized that on comparison of the critical regions
for $s$ and $\log s$ (cf. (48) and (51)) there is not much
basis for choice. The choice of the critical region
based on the distribution of $\log s$ has certain theoretical
advantages, but the computation of the critical
region is somewhat more complicated since it
involves the approximation of $\psi(n/2)$.

The effectiveness of formulas (47) and (53) varies
when $\alpha$ and $\beta$ are not equal. It appears (see table
5 and table 6) that for $\beta > \alpha$ it is safer to use (53)
because it is always likely to err on the safe side in
the sense of the previous paragraph. However, if
$\beta < \alpha$ it appears that it is safer to use (47) because
(53) is likely to give a value of $n$ which will actually
be less than the necessary sample size.

Table 6. Comparison of the tabulated and calculated values of $\lambda(\alpha, \beta, n)$, $\alpha = .05$

$\beta = .25$

   n      (49)      (52)      (54)   Tabulated
   5     3.734     5.073     5.155     4.139
  10     2.595     2.977     2.984     2.717
  15     2.199     2.401     2.403     2.265
  20     1.990     2.122     2.122     2.033

$\beta = .05$

   n      (49)      (52)      (54)   Tabulated
   5    10.037    10.01     10.239     9.664
  10     4.681     4.700     4.715     4.646
  15     3.454     3.463     3.468     3.442
  20     2.900     2.906     2.907     2.895

$\beta = .01$

   n      (49)      (52)      (54)   Tabulated
   5    33.065    16.14     16.577    19.972
  10     8.126     6.476     6.501     7.156
  15     5.109     4.479     4.487     4.780
  20     3.973     3.625     3.627     3.802

7.3. Comparison of Two Population Variances 

Let $\sigma_1^2$ and $\sigma_2^2$ denote the variances of the two 
normal populations and let $s_1^2$ and $s_2^2$ be their 
independent sample estimates based on $n_1$ and $n_2$ degrees 
of freedom, respectively. Our hypotheses are 
$H_1$: $\sigma_1^2=\sigma_2^2$ and $H_2$: $\sigma_1^2>\sigma_2^2$. 
The statistic $F=s_1^2/s_2^2$ is used to accept 
or reject $H_1$. For a given level of significance $\alpha$, 
we accept $H_1$ if $F<F_\alpha$ and accept $H_2$ if $F\geq F_\alpha$. 
Let $\phi$ denote the true value of $\sigma_1^2/\sigma_2^2$. It has been 
shown in [2] and [4] that the probability of an error 
of the second kind will be exactly $\beta$ if 



$$\phi(\alpha,\beta,n_1,n_2)=\frac{F_{\alpha}(n_1,n_2)}{F_{1-\beta}(n_1,n_2)}=F_{\alpha}(n_1,n_2)\,F_{\beta}(n_2,n_1). \qquad (55)$$



The operating characteristics $(\phi, \beta)$ for one-sided 
alternatives have been given in [2] and [4]. We shall 
develop here certain approximate formulas for $\phi$ 
in terms of $\alpha$, $\beta$, $n_1$, and $n_2$. 
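
As a point of reference for the approximations that follow, relation (55) can also be evaluated numerically. The sketch below is only an illustration and is not the method of [2] or [4]: it assumes both degrees of freedom are at least 2, obtains the $F$-distribution function from the incomplete beta integral by Simpson's rule, and finds percentage points by bisection. The function names `beta_cdf`, `f_upper_point`, and `phi_exact` are hypothetical.

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule.

    Assumes a >= 1 and b >= 1 (at least 2 degrees of freedom each),
    so that the integrand stays bounded on [0, x].
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(t):
        return norm * t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    width = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * density(i * width)
    return total * width / 3.0


def f_upper_point(eps, n1, n2):
    """Upper eps point of F with (n1, n2) d.o.f.: P(F > value) = eps."""

    def cdf(f):
        # If F ~ F(n1, n2), then n1*F / (n1*F + n2) ~ Beta(n1/2, n2/2).
        return beta_cdf(n1 * f / (n1 * f + n2), n1 / 2.0, n2 / 2.0)

    low, high = 0.0, 1.0
    while cdf(high) < 1.0 - eps:  # bracket the percentage point
        high *= 2.0
    for _ in range(50):  # bisection
        mid = 0.5 * (low + high)
        if cdf(mid) < 1.0 - eps:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def phi_exact(alpha, beta, n1, n2):
    """Equation (55): phi = F_alpha(n1, n2) * F_beta(n2, n1)."""
    return f_upper_point(alpha, n1, n2) * f_upper_point(beta, n2, n1)


print(round(phi_exact(0.05, 0.05, 5, 5), 2))
print(round(phi_exact(0.05, 0.25, 10, 5), 3))
```

Under these assumptions the two printed values should agree, to about the accuracy shown, with the tabulated entries 25.51 and 7.507 quoted from [4] in tables 7a and 7c.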

By taking one-half the logarithm of (55) we obtain 

$$\tfrac{1}{2}\log\phi=z_{\alpha}(n_1,n_2)+z_{\beta}(n_2,n_1), \qquad (56)$$



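where $z_{\alpha}$ denotes the $\alpha$-probability point of Fisher's 
$z$-distribution. For purposes of approximation, (56) 
has a decided advantage over (55) in that it enables 
us to make better use of the Cornish-Fisher 
approximation [13], since the $z$-distribution approaches 
normality relatively faster than the $F$-distribution. 
We shall employ the following approximations [14] 
for the cumulants of $z$: 

$$\kappa_1=\frac{1}{2}\left(\frac{1}{n_2}-\frac{1}{n_1}\right),\qquad \kappa_2=\frac{1}{2}\left(\frac{1}{n_2}+\frac{1}{n_1}\right).$$

Let $\bar{z}_1$ and $\bar{z}_2$ denote the mathematical 
expectation of $z$ under $H_1$ and $H_2$, respectively. 
Consequently we have 

$$\bar{z}_1=\frac{1}{2}\left(\frac{1}{n_2}-\frac{1}{n_1}\right),\qquad \bar{z}_2=\frac{1}{2}\left(\frac{1}{n_2}-\frac{1}{n_1}\right)+\frac{1}{2}\log\phi.$$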



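By applying the results of section 3 to $z$, it is found 
that 

$$\frac{1}{2}\log\phi=(K_\alpha+K_\beta)\left[\frac{1}{2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right]^{1/2}, \qquad (57)$$

which may be rewritten as 

$$\log\phi=\frac{2(K_\alpha+K_\beta)}{\sqrt{h}}, \qquad (58)$$

where $h=\dfrac{2n_1n_2}{n_1+n_2}$. By using 
$\kappa_2=\dfrac{1}{2}\left(\dfrac{1}{n_2-1}+\dfrac{1}{n_1-1}\right)$ 
there results 

$$\log\phi=(K_\alpha+K_\beta)\left[2\left(\frac{1}{n_1-1}+\frac{1}{n_2-1}\right)\right]^{1/2}. \qquad (59)$$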
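
Formulas (58) and (59) are simple enough to evaluate directly. The following is a minimal sketch, assuming natural logarithms and taking $K_\epsilon$ to be the standard normal deviate exceeded with probability $\epsilon$; the function names `normal_deviate`, `phi_58`, and `phi_59` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def normal_deviate(eps):
    """K_eps: standard normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def phi_58(alpha, beta, n1, n2):
    """Formula (58): log phi = 2 (K_alpha + K_beta) / sqrt(h)."""
    h = 2.0 * n1 * n2 / (n1 + n2)
    k_sum = normal_deviate(alpha) + normal_deviate(beta)
    return math.exp(2.0 * k_sum / math.sqrt(h))


def phi_59(alpha, beta, n1, n2):
    """Formula (59): same form with kappa_2 = (1/(n1-1) + 1/(n2-1)) / 2."""
    kappa2 = 0.5 * (1.0 / (n1 - 1) + 1.0 / (n2 - 1))
    k_sum = normal_deviate(alpha) + normal_deviate(beta)
    return math.exp(2.0 * k_sum * math.sqrt(kappa2))


for n in (5, 10, 20, 60):
    print(n, round(phi_58(0.05, 0.05, n, n), 3), round(phi_59(0.05, 0.05, n, n), 3))
```

For $n_1=n_2=5$ and $\alpha=\beta=.05$, (58) gives a value well below the tabulated 25.51, while (59) gives a value slightly above it, which is the behavior described in the discussion of tables 7 and 8.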



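Directly applying to (56) the Cornish-Fisher 
approximation [13] for the percentage points of the 
$z$-distribution, in conjunction with Cochran's 
$\lambda$-corrections [15], we obtain (58) and the following two 
expressions for $\log\phi$: 

$$\log\phi=2\left[\frac{K_\alpha}{\sqrt{h-\lambda_\alpha}}+\frac{K_\beta}{\sqrt{h-\lambda_\beta}}\right], \qquad (60)$$

$$\log\phi=2\left[\frac{K_\alpha}{\sqrt{h-\lambda_\alpha}}+\frac{K_\beta}{\sqrt{h-\lambda_\beta}}\right]+2\left(\frac{1}{n_2}-\frac{1}{n_1}\right)(\lambda_\alpha-\lambda_\beta), \qquad (61)$$

where $\lambda_\epsilon$ is given by 

  $\epsilon$     $\lambda_\epsilon$
   .75       .5758
   .50       .5000
   .25       .5758
   .05       .9509
   .01      1.4020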



Formulas (58), (59), and (60) are not changed when 
$n_1$ and $n_2$ are interchanged. Formulas (60) and (61) 
are identical when either $\alpha=\beta$ or $n_1=n_2$, or both. 
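
The following sketch evaluates (60) and (61) numerically. It rests on one assumption that is not stated in the text: the tabulated corrections $\lambda_\epsilon$ are generated by the closed form $(K_\epsilon^2+3)/6$, which reproduces the five tabulated values to four decimals. The function names `cochran_lambda`, `phi_60`, and `phi_61` are hypothetical.

```python
import math
from statistics import NormalDist


def normal_deviate(eps):
    """K_eps: standard normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def cochran_lambda(eps):
    """Assumed closed form (K_eps**2 + 3) / 6 for the lambda-correction;
    it matches the tabulated .5758, .5000, .9509, and 1.4020."""
    k = normal_deviate(eps)
    return (k * k + 3.0) / 6.0


def log_phi_60(alpha, beta, n1, n2):
    """Right-hand side of (60), symmetric in n1 and n2."""
    h = 2.0 * n1 * n2 / (n1 + n2)
    term_a = normal_deviate(alpha) / math.sqrt(h - cochran_lambda(alpha))
    term_b = normal_deviate(beta) / math.sqrt(h - cochran_lambda(beta))
    return 2.0 * (term_a + term_b)


def phi_60(alpha, beta, n1, n2):
    return math.exp(log_phi_60(alpha, beta, n1, n2))


def phi_61(alpha, beta, n1, n2):
    """Formula (61): (60) plus the term that vanishes when
    alpha == beta or n1 == n2."""
    extra = 2.0 * (1.0 / n2 - 1.0 / n1) * (cochran_lambda(alpha) - cochran_lambda(beta))
    return math.exp(log_phi_60(alpha, beta, n1, n2) + extra)


print(round(phi_61(0.05, 0.05, 5, 5), 3))
print(round(phi_61(0.05, 0.25, 10, 5), 3), round(phi_61(0.05, 0.25, 5, 10), 3))
```

With these assumptions the output should agree closely with the calculated entries of tables 7a and 7c; small differences in the last digit are to be expected if the printed values were computed from rounded $K_\epsilon$.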

Table 7a.* Comparison of the calculated (formula 61) and tabulated values of $\phi(\alpha, \beta, n_1, n_2)$

$\alpha = .05$, $\beta = .05$

                                     $n_1$
 $n_2$              5        10        15        20        30        60

   5   calc.   26.312    15.674    13.082    11.918    10.844     9.839
       tab.    25.51     15.75     13.40     12.36     11.39     10.49

  10   calc.   15.674     8.910     7.240     6.480     5.787     5.130
       tab.    15.75      8.870     7.237     6.513     5.844     5.223

  15   calc.   13.082     7.240     5.787     5.130     4.516     3.937
       tab.    13.40      7.237     5.777     5.128     4.527     3.967

  20   calc.   11.918     6.486     5.130     4.516     3.937     3.390
       tab.    12.36      6.513     5.128     4.512     3.939     3.402

  30   calc.   10.844     5.787     4.516     3.937     3.390     2.866
       tab.    11.39      5.844     4.527     3.939     3.389     2.869

  60   calc.    9.839     5.130     3.937     3.390     2.866     2.355
       tab.    10.49      5.223     3.967     3.402     2.869     2.354

* The tabulated values (rows marked tab.) are taken from [4]. 

Table 7b.* Comparison of the calculated (formula 61) and tabulated values of $\phi(\alpha, \beta, n_1, n_2)$

$\alpha = .01$, $\beta = .01$

                                     $n_1$
 $n_2$              5        10        15        20        30        60

   5   calc.  135.047    57.720    43.294    37.428    32.304    27.816
       tab.   120.3      56.65     44.29     39.19     34.69     30.72

  10   calc.   57.720    23.893    17.434    14.791    12.474    10.436
       tab.    58.65     23.51     17.35     14.84     12.65     10.74

  15   calc.   43.294    17.434    12.474    10.436     8.650     7.082
       tab.    44.29     17.34     12.41     10.41      8.679     7.168

  20   calc.   37.428    14.791    10.436     8.650     7.082     5.697
       tab.    39.19     14.84     10.41      8.630     7.082     5.731

  30   calc.   32.304    12.474     8.650     7.082     5.697     4.471
       tab.    34.69     12.65      8.679     7.082     5.693     4.479

  60   calc.   27.816    10.436     7.082     5.697     4.471     3.372
       tab.    30.72     10.74      7.168     5.731     4.479     3.372

* The tabulated values (rows marked tab.) are taken from [4]. 
Table 7c. Comparison of the calculated (formula 61) and tabulated values of $\phi(\alpha, \beta, n_1, n_2)$

$\alpha = .05$, $\beta = .25$

                                     $n_1$
 $n_2$              5        10        15        20        30        60

   5   calc.    9.742     7.371     6.674     6.338     6.012     5.693
       tab.     9.569     7.507     6.900     6.609     6.327     6.052

  10   calc.    6.345     4.632     4.112     3.837     3.607     3.359
       tab.     6.285     4.620     4.123     3.882     3.646     3.614

  15   calc.    5.464     3.192     3.432     3.195     2.959     2.722
       tab.     5.469     3.902     3.428     3.197     2.969     2.741

  20   calc.    5.061     3.578     3.116     2.886     2.655     2.421
       tab.     5.102     3.577     3.113     2.885     2.658     2.429

  30   calc.    4.682     3.264     2.815     2.590     2.361     2.125
       tab.     4.759     3.273     2.815     2.589     2.361     2.127

  60   calc.    4.325     2.964     2.526     2.303     2.072     1.828
       tab.     4.439     2.987     2.533     2.306     2.073     1.828

Table 8. Values of $\phi(\alpha, \beta, n_1, n_2)$ from $\log\phi=(K_\alpha+K_\beta)\left[2\left(\frac{1}{n_1-1}+\frac{1}{n_2-1}\right)\right]^{1/2}$ (formula 59)

$\alpha = .05$, $\beta = .05$

                              $n_1$
 $n_2$       5        10        15        20        30        60

   5     26.83     16.37     13.98     12.93     11.96     11.06
  10     16.37      8.963     7.299     6.570     5.901     5.284
  15     13.98      7.299     5.803     5.148     4.545     3.987
  20     12.93      6.570     5.148     4.524     3.948     3.412
  30     11.96      5.901     4.545     3.948     3.393     2.872
  60     11.06      5.284     3.987     3.393     2.872     2.355

$\alpha = .01$, $\beta = .01$

                              $n_1$
 $n_2$       5        10        15        20        30        60

   5    104.9      52.15     41.70     37.33     33.43     29.95
  10     52.15     22.24     16.63     14.33     12.31     10.53
  15     41.70     16.63     12.02     10.15      8.511     7.072
  20     37.33     14.33     10.15      8.455     6.973     5.673
  30     33.43     12.31      8.511     6.973     5.629     4.447
  60     29.95     10.53      7.072     5.673     4.447     3.358

$\alpha = .05$, $\beta = .25$

                              $n_1$
 $n_2$       5        10        15        20        30        60

   5     10.17      7.178     6.421     6.076     5.752     5.445
  10      7.178     4.694     4.061     3.771     3.496     3.234
  15      6.421     4.061     3.455     3.175     2.908     2.651
  20      6.076     3.771     3.175     2.898     2.633     2.376
  30      5.752     3.496     2.908     2.633     2.366     2.104
  60      5.445     3.234     2.651     2.376     2.104     1.829


It appears on the basis of several computations 
(not given here) that formula (58) is likely to give 
values of $\phi$ that are much lower than its tabulated 
values [4], and consequently the sample sizes given 
by it will fall below the minimum desired. For 
$\alpha=\beta$, formula (61), then equivalent to (60), gives 
values of $\phi$ that are much closer to its tabulated 
values (table 7). This is also true of (61) for $\beta\neq\alpha$. 
If, in addition to $\alpha=\beta$, $n_1=n_2=n$, then from (60) 

$$n=\lambda_\alpha+\left(\frac{4K_\alpha}{\log\phi}\right)^2. \qquad (62)$$

We have not found any formula that will give 
an approximately correct answer for degrees of 
freedom as small as, say, $n_1=n_2=5$. The question 
of finding appropriate sample sizes for $n_1\neq n_2$ cannot 
ordinarily be answered without the help of tables. 
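
Relation (62) follows from (60) on setting $\alpha=\beta$ and $n_1=n_2=n$, so that $h=n$ and $\log\phi=4K_\alpha/\sqrt{n-\lambda_\alpha}$. A minimal sketch of its use is given below; it assumes natural logarithms and again takes $\lambda_\alpha=(K_\alpha^2+3)/6$, a closed form that reproduces the tabulated corrections but is not stated in the text. The function name `n_from_62` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_deviate(eps):
    """K_eps: standard normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def n_from_62(phi, alpha):
    """Common degrees of freedom n from (62), for alpha == beta and n1 == n2.

    Assumes lambda_alpha = (K_alpha**2 + 3) / 6, which matches the
    tabulated lambda-corrections.
    """
    k = normal_deviate(alpha)
    lam = (k * k + 3.0) / 6.0
    return lam + (4.0 * k / math.log(phi)) ** 2


n = n_from_62(4.0, 0.05)
print(round(n, 2), math.ceil(n))
```

For a variance ratio $\phi=4$ with $\alpha=\beta=.05$ this gives a value between 23 and 24, to be rounded up to 24 degrees of freedom in each sample, which is consistent with the calculated entries 4.516 (for $n=20$) and 3.390 (for $n=30$) in table 7a.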



In such a situation, however, the experimentalist 
has no choice in the determination of $n_1$ and $n_2$. If 
it is decided in advance to maintain a certain ratio 
between $n_1$ and $n_2$ (this appears to be more often 
the case in practice), a formula would be more 
practical to use than the existing tables. Since 
formula (61) seems to be very complicated to use, 
we recommend the use of (59). Values of $\phi$ as given 
by this formula are given in table 8. It appears 
that for $\alpha=\beta=.05$, formula (59) will always give 
sufficient sample sizes; but for $\alpha=\beta=.01$, it will 
give values slightly less than actually needed. 
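
When the ratio between $n_1$ and $n_2$ is fixed in advance, formula (59) can be searched directly for the smallest adequate degrees of freedom. The sketch below is one possible way to do this and is not a procedure given in the paper: it assumes $n_2$ is taken as the ratio times $n_1$, rounded up, and it steps $n_1$ upward until the value of $\phi$ from (59) no longer exceeds the ratio to be detected. The names `phi_59`, `smallest_design`, `ratio`, and `n1_max` are hypothetical.

```python
import math
from statistics import NormalDist


def normal_deviate(eps):
    """K_eps: standard normal deviate exceeded with probability eps."""
    return NormalDist().inv_cdf(1.0 - eps)


def phi_59(alpha, beta, n1, n2):
    """Formula (59) with kappa_2 = (1/(n1-1) + 1/(n2-1)) / 2."""
    kappa2 = 0.5 * (1.0 / (n1 - 1) + 1.0 / (n2 - 1))
    k_sum = normal_deviate(alpha) + normal_deviate(beta)
    return math.exp(2.0 * k_sum * math.sqrt(kappa2))


def smallest_design(phi_target, alpha, beta, ratio=1.0, n1_max=10000):
    """Smallest (n1, n2), with n2 = ceil(ratio * n1), for which the phi
    given by (59) does not exceed phi_target."""
    for n1 in range(2, n1_max + 1):
        n2 = max(2, math.ceil(ratio * n1))
        if phi_59(alpha, beta, n1, n2) <= phi_target:
            return n1, n2
    raise ValueError("no design found up to n1_max")


print(smallest_design(4.0, 0.05, 0.05, ratio=1.0))
print(smallest_design(4.0, 0.05, 0.05, ratio=2.0))
```

In view of the remark above, a result obtained this way for $\alpha=\beta=.01$ should be treated as a slight underestimate and checked against the tables of [4].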



This paper could not have been written without 
the constant encouragement of Churchill Eisenhart. 
The author acknowledges with pleasure the help 
given by Lola S. Deming and Celia S. Martin for 
doing the enormous computations. Thanks are also 
due to Elizabeth Shuhany of the Statistical Labora- 
tory, Boston University, for some computational help. 
9. Appendix. Tabular summary of single-sampling formulas for testing hypotheses 



fnfjx? ^^«^"^ «^ population „^ji hypothesis 



O 



> 
W 

t-H 

o 

H 
O 

CD 

a- 

i-i 



r.l 



6.1 



G.2 



6.3 



6.4 



7.1 



7.2 
7.3 



7.3 



7.3 

CD 

C71 



Single binomial, mean P 
Sample mean p 



Two binomials, means 

P and P* 
Sample means p and p* 

Single Poisson, mean m 
Sample mean x 

Two Poissons, means M 

and M* 
Sample means x and x* 

Single normal, mean fi 



Two normals, means ^i 
and /JL2 

Two normals, means fi^ 
and ^2 



Single normal, mean ju; 

variance a^ 
§2= estimate of <r2 
(unknown) 



Two normals, means n^ 
and /i2 



Single normal, variance 
52= estimate of <t- 



do 

Two normals, variances 

0-2 and <tI 
«^and «^=estimates of(rj2 

and <t2 



do 



do 



II2 
alternative hypothesis 



P=P, 



P=P* 



(r2=a2 
(known) 

(both known) 

<r2=o-2=(r2 

(known) 



do 



do 



do 



do 



P=P, 

{P,>P,) 

P<P* 



(W2>w,) 
M<M* 

(M2>M,) 

cr2=o-o^ (known) 

Mi<M2 

<r2^a2 

(both known) 

M,<M2 

(known) 



(/i2>M,) 



M,<A'2 



<r2 = X(T2 
(X>1) 



<^?><^; 



do 



do 



Criterion for rejecting IIi 



e>2 



K^ sin-i V^i + Ka sin-i 7^2 



where 6=2 sin-' >^ 

V2A^ 
where 9' = sin-i Vp*— sin-^ -y^ 
Ka^lrn^-\-K^^[m[ 



) 



V^>- 



^2-^l> 



-^a(M2 — Ml) 
-?^«(M2-Atl) 



(f-/ii)-/c-4=>o, 

Viv 



where /;: 



a'^aVa^ 



where fc 



"V 2 ' A-^+A^ 



VXao(A-^+Ag) 

'^ ^a+VX Ka 

Kfi log <r 0+ Aa log Vx q-Q 
^*> Ap + /^3 

do 



A« 



' si Tij «j AT^+A^ 



do 



log <t> 



do 



Formula for determining AT 



4\sin-i VP2-sin-i ^/p,y 

kJ{ ^+^« Y 

2\sin-i V-P*-sin-i /P/ 

n4( 1»+^^ Y 

^\V"»2— Vwi/ 

^. i/^+£^Y 

\^/M*-^/MJ 

..(ii±£,)- 
„.(ii±S)- 

^^ 6+V62-4a 

A^= 5 or 

2a 

a+l+V(a-l)2+2a< 
2a 

A'=*rf^, or 
2a 

0+2+ V(Q-2)2+2a^^(w) 
2a 

,,(,0=1 — ^ ) 

\A«+i^^/ 

log <,!.= ( A«+A^) 

log0=2( -=^+-=== j 



Formula for determining K^ 



A'^ = 2 VAr(sin-i ^JP.2-sm-^ ^/P^) -A« 

A'^= V2iV(sin-i V^-sin-i ^P) - K^ 

A3 = 2 V^ ( Vw2 - Vw,) - iir« 

K^ = V2iV ( V AT* - V^) - Ka 
K^ = ^^|N-K^ 

/A„+V2wV 

X(«,^,W)=( " ^ ) 

\^j2n-K^J 



2^K^(n) (K^ + K^) 



\(a,ft,n)=e 



logX=-^^-l-(A„+A>) 



Notes 



d=- 



V<r2 + a| 



,=i+.(,+^) 

"=(a,+ a>) 

^ = 2+a(l + -^^) 
n=A^-l 

(7- 

X=— 



/i = 



2njn2 



